{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(tidycensus)\n",
    "library(tidyverse)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Geography-language analysis using ACS 5-Year Data\n",
    "As mentioned in the section above, individual-level microdata are available only for Public Use Microdata Areas (PUMAs), which are fairly large geographic regions: California has 265 PUMAs[<sup>3</sup>](#fn3), compared with 8,057 census tracts and 23,212 census block groups (CBGs)[<sup>4</sup>](#fn4). To relate an individual’s address to preferred language at a finer granularity, we use the 2018 ACS 5-year estimates[<sup>5</sup>](#fn5), which provide data at the tract and CBG levels.\n",
    "\n",
    "At the CBG level, the 5-year estimates report household language data in less detail than the tract-level data[<sup>6</sup>](#fn6). For Spanish-speaking households, for example, the estimates give a count split into “Limited English speaking household” and “Not a limited English speaking household”.\n",
    "\n",
    "Similar to the age-language analysis, the notebook code builds a lookup table keyed by CBG, filling the cells with the proportion of each language group, and of ESL households overall, in each geographic region. The regions are identified by their GEOIDs[<sup>7</sup>](#fn7).\n",
    "\n",
    "<span id=\"fn3\"><sup>3</sup>&nbsp;&nbsp;&nbsp;&nbsp;<https://www.census.gov/geographies/reference-maps/2010/geo/2010-pumas/california.html></span>\n",
    "\n",
    "<span id=\"fn4\"><sup>4</sup>&nbsp;&nbsp;&nbsp;&nbsp;<https://www.census.gov/geographies/reference-files/2010/geo/state-local-geo-guides-2010/california.html#:~:text=California%20has%208%2C057%20census%20tracts,block%20groups%2C%20and%20710%2C145%20blocks.></span>\n",
    "        \n",
    "<span id=\"fn5\"><sup>5</sup>&nbsp;&nbsp;&nbsp;&nbsp;<https://www.census.gov/data/developers/data-sets/acs-5year.html></span>\n",
    "\n",
    "<span id=\"fn6\"><sup>6</sup>&nbsp;&nbsp;&nbsp;&nbsp;<https://data.census.gov/cedsci/table?q=language%20spoken%20at%20home&g=0500000US06085&y=2018&d=ACS%205-Year%20Estimates%20Detailed%20Tables&tid=ACSDT5Y2018.C16002&hidePreview=true></span>\n",
    "\n",
    "<span id=\"fn7\"><sup>7</sup>&nbsp;&nbsp;&nbsp;&nbsp;<https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html></span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CBG-language table construction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the exception of `total`, each header names a language group and covers only the individuals/households in that group who have limited English proficiency."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Table C16002: total households, plus the limited English speaking count for each language group.\n",
    "cbg_variables <- c(\n",
    "        total = \"C16002_001\",\n",
    "        spanish = \"C16002_004\",\n",
    "        other_indo_european = \"C16002_007\",\n",
    "        asian_pacific_island = \"C16002_010\",\n",
    "        other = \"C16002_013\"\n",
    ")\n",
    "scc_language_data_cbg <- get_acs(\n",
    "    geography = \"block group\",\n",
    "    year = 2018,\n",
    "    state = \"CA\",\n",
    "    county = \"Santa Clara\",\n",
    "    key = Sys.getenv(\"CENSUS_API_KEY\"),  # read the Census API key from the environment instead of hard-coding it\n",
    "    survey = \"acs5\",\n",
    "    variables = cbg_variables\n",
    ")\n",
    "saveRDS(scc_language_data_cbg, \"../data/scc_language_by_cbg.rds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scc_language_data_cbg <- readRDS(\"../data/scc_language_by_cbg.rds\")\n",
    "\n",
    "language_preference_by_cbg <- data.frame()\n",
    "\n",
    "# get_acs() returns one row per (CBG, variable) pair, so each consecutive run of\n",
    "# length(cbg_variables) rows describes one CBG. Widen each run into a single row.\n",
    "n_vars <- length(cbg_variables)\n",
    "for (i in seq_len(nrow(scc_language_data_cbg) / n_vars)) {\n",
    "    start_index <- (i - 1) * n_vars + 1\n",
    "    end_index <- i * n_vars\n",
    "    cbg_bucket <- data.frame(\n",
    "        GEOID = scc_language_data_cbg$GEOID[start_index],\n",
    "        NAME = scc_language_data_cbg$NAME[start_index]\n",
    "    )\n",
    "\n",
    "    for (j in start_index:end_index) {\n",
    "        variable <- scc_language_data_cbg$variable[j]\n",
    "        cbg_bucket[paste(variable, \"estimate\", sep = \"_\")] <- scc_language_data_cbg$estimate[j]\n",
    "        cbg_bucket[paste(variable, \"moe\", sep = \"_\")] <- scc_language_data_cbg$moe[j]\n",
    "        # This relies on `total` being the first variable listed for each CBG,\n",
    "        # so that total_estimate already exists when the other groups are processed.\n",
    "        if (variable != \"total\") {\n",
    "            cbg_bucket[paste(variable, \"probability\", sep = \"_\")] <- scc_language_data_cbg$estimate[j] / cbg_bucket$total_estimate\n",
    "        }\n",
    "    }\n",
    "    language_preference_by_cbg <- rbind(language_preference_by_cbg, cbg_bucket)\n",
    "}\n",
    "\n",
    "# prob_esl: share of households in any limited English speaking group, computed as\n",
    "# the sum of all *_estimate columns minus the total, divided by the total.\n",
    "language_preference_by_cbg <- language_preference_by_cbg %>%\n",
    "    mutate(prob_esl = (rowSums(select(., ends_with(\"_estimate\"))) - total_estimate) / total_estimate)\n",
    "language_preference_by_cbg"
   ]
  },
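  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The row-by-row reshaping in the cell above can also be expressed as a single pivot. The following is a minimal sketch rather than the notebook's method: it assumes tidyr 1.0.0 or later (for `pivot_wider()` and its `names_glue` argument), and `toy_long`, `toy_wide`, and every value in them are hypothetical stand-ins for the `get_acs()` output.\n",
    "\n",
    "```r\n",
    "library(tidyverse)\n",
    "\n",
    "# Hypothetical long-format data shaped like get_acs() output (values are made up).\n",
    "toy_long <- tibble(\n",
    "    GEOID = rep(c('A', 'B'), each = 3),\n",
    "    NAME = rep(c('Block Group A', 'Block Group B'), each = 3),\n",
    "    variable = rep(c('total', 'spanish', 'other'), times = 2),\n",
    "    estimate = c(100, 10, 5, 200, 40, 0),\n",
    "    moe = c(12, 4, 3, 20, 9, 11)\n",
    ")\n",
    "\n",
    "# One row per CBG, with columns named <variable>_estimate and <variable>_moe.\n",
    "toy_wide <- toy_long %>%\n",
    "    pivot_wider(\n",
    "        names_from = variable,\n",
    "        values_from = c(estimate, moe),\n",
    "        names_glue = '{variable}_{.value}'\n",
    "    ) %>%\n",
    "    mutate(\n",
    "        spanish_probability = spanish_estimate / total_estimate,\n",
    "        other_probability = other_estimate / total_estimate,\n",
    "        prob_esl = (spanish_estimate + other_estimate) / total_estimate\n",
    "    )\n",
    "```\n",
    "\n",
    "Because the pivot matches rows by `GEOID` and `variable`, this form does not depend on the order in which the variables are listed within each CBG."
   ]
  },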
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "write_csv(language_preference_by_cbg, \"../data/language_preference_by_cbg.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the count of CBGs by proportion of limited English speaking Spanish-language households.\n",
    "lang_by_cbg <- read_csv(\"../data/language_preference_by_cbg.csv\")\n",
    "spanish_cbg_hist <- hist(\n",
    "    lang_by_cbg$spanish_probability,\n",
    "    xlab=\"Proportion of limited English speaking Spanish-language households\",\n",
    "    ylab=\"Count of CBGs\",\n",
    "    main=\"Histogram of limited English speaking Spanish-language household proportion by CBG\"\n",
    ")\n",
    "spanish_cbg_hist"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
